Creating a Systemic Functional Grammar Corpus from the Penn Treebank
Abstract
The lack of a large annotated systemic functional grammar (SFG) corpus has posed a significant challenge for the development of the theory. Automating SFG annotation is challenging because the theory uses a minimal constituency model, allocating as much of the work as possible to a set of hierarchically organised features. In this paper we show that despite the unorthodox organisation of SFG, adapting existing resources remains the most practical way to create an annotated corpus. We present and analyse SFGBank, an automated conversion of the Penn Treebank into systemic functional grammar. The corpus is comparable to those available for other linguistic theories, offering many opportunities for new research.
Similar resources
Converting the Penn Treebank to Systemic Functional Grammar
Systemic functional linguistics offers a grammar that is semantically organised, so that salient grammatical choices are made explicit. This paper describes the explication of these choices through the conversion of the Penn Treebank into a systemic functional grammar corpus. Developing such a resource can help connect work in natural language processing to a significant body of research dealin...
CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank
This article presents an algorithm for translating the Penn Treebank into a corpus of Combinatory Categorial Grammar (CCG) derivations augmented with local and long-range word–word dependencies. The resulting corpus, CCGbank, includes 99.4% of the sentences in the Penn Treebank. It is available from the Linguistic Data Consortium, and has been used to train wide-coverage statistical parsers that ob...
Corpus-Oriented Grammar Development for Acquiring a Head-Driven Phrase Structure Grammar from the Penn Treebank
This paper describes a method of semi-automatically acquiring an English HPSG grammar from the Penn Treebank. First, heuristic rules are employed to annotate the treebank with partially-specified derivation trees. Lexical entries are automatically extracted from the annotated corpus by inversely applying schemata to partially-specified derivation trees.
Induction of Treebank-Aligned Lexical Resources
By ‘treebank-aligned lexical resources’ we mean ones where there is a systematic correspondence between the lexical resource and treebank syntactic resources. For instance, the lexicon resource contains features representing the subcategorization frames of verbs, which correspond to structural configurations that the verb occurs in, in a treebank. Given such an alignment, a treebank can be comp...
Parallel Multi-Theory Annotations of Syntactic Structure
We present an approach to creating a treebank of sentences using multiple notations or linguistic theories simultaneously. We illustrate the method by annotating sentences from the Penn Treebank II in three different theories in parallel: the original PTB notation, a Functional Dependency Grammar notation, and a Government and Binding style notation. Sentences annotated with all of these theori...